Rotated Canonical Correlation Analysis for Multilingual Corpora
Authors
Abstract
This paper proposes the joint use of Canonical Correlation Analysis and Procrustes Rotation (Rotated Canonical Correlation Analysis, RCA) for the analysis of a text and its translation into another language. The basic idea is to represent words from the two natural languages in a common reference space. The main characteristic of this space is that it is language independent, even though the two steps treat the languages differently: the Procrustes Rotation transforms the lexical table derived from the translation by minimizing its distance from the lexical table of the original corpus, whereas the subsequent Canonical Correlation Analysis treats the two word sets symmetrically. The most interesting feature of RCA is that it builds a single reference space for representing the correlation structure in the data, forcing the two systems of canonical factors to lie in the same space. The resulting graphical representations enable us to read the distance between corresponding points as a difference in how the same word is translated, relative to the general context defined by the canonical variates. Interpreting the distances between matched points could be a useful tool for enriching lexical resources in a translation procedure. In this paper we compare the most frequent content-bearing words in the two languages, analyzing one year (2003) of Le Monde Diplomatique and its Italian edition.
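The two steps described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`procrustes_rotation`, `canonical_correlations`, `rotated_cca`) are hypothetical, the two lexical tables are assumed to share the same rows and to have the same number of columns, and plain column centering stands in for whatever weighting the paper applies to the lexical tables, which the abstract does not specify.

```python
import numpy as np


def procrustes_rotation(source, target):
    """Orthogonal matrix R minimising ||source @ R - target|| (Frobenius norm)."""
    u, _, vt = np.linalg.svd(source.T @ target)
    return u @ vt


def canonical_correlations(x, y, ridge=1e-8):
    """Canonical correlations and weight matrices for two tables with shared rows."""
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    # A small ridge term (an assumption of this sketch) keeps the
    # cross-product matrices positive definite for the Cholesky step.
    sxx = x.T @ x + ridge * np.eye(x.shape[1])
    syy = y.T @ y + ridge * np.eye(y.shape[1])
    lx = np.linalg.cholesky(sxx)  # sxx = lx @ lx.T
    ly = np.linalg.cholesky(syy)  # syy = ly @ ly.T
    # k = lx^{-1} @ (x.T @ y) @ ly^{-T}; its singular values are the
    # canonical correlations.
    k = np.linalg.solve(lx, x.T @ y)
    k = np.linalg.solve(ly, k.T).T
    u, corr, vt = np.linalg.svd(k)
    a = np.linalg.solve(lx.T, u)     # weights for the columns of x
    b = np.linalg.solve(ly.T, vt.T)  # weights for the columns of y
    return corr, a, b


def rotated_cca(original, translated):
    """Rotate the translated table towards the original, then run CCA."""
    x = original - original.mean(axis=0)
    y = translated - translated.mean(axis=0)
    rotation = procrustes_rotation(y, x)  # asymmetric step
    y_rot = y @ rotation
    corr, a, b = canonical_correlations(x, y_rot)  # symmetric step
    return corr, x @ a, y_rot @ b
```

Calling `rotated_cca(original, translated)` on two such tables returns the canonical correlations together with the canonical variates of each table. The asymmetry noted in the abstract shows up in `procrustes_rotation`, where only the translated table is moved, while `canonical_correlations` handles both tables in the same way.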
Related resources
Improving Vector Space Word Representations Using Multilingual Correlation
The distributional hypothesis of Harris (1954), according to which the meaning of words is evidenced by the contexts they occur in, has motivated several effective techniques for obtaining vector space semantic representations of words using unannotated text corpora. This paper argues that lexico-semantic content should additionally be invariant across languages and proposes a simple technique ...
Deep Multilingual Correlation for Improved Word Embeddings
Word embeddings have been found useful for many NLP tasks, including part-of-speech tagging, named entity recognition, and parsing. Adding multilingual context when learning embeddings can improve their quality, for example via canonical correlation analysis (CCA) on embeddings from two languages. In this paper, we extend this idea to learn deep non-linear transformations of word embeddings of ...
Multivariate Characterisation of Oulmes-Zaer and Tidili Cattle Using the Morphological Traits
Fourteen different morphological traits in 169 and 131 cattle of Oulmes-Zaer and Tidili, respectively, were recorded and analyzed using a multivariate approach. The characters measured included heart girth, wither height, rump height, rump length, rump width, chest depth, body length, neck length, cannon circumference, ear length, ear width, head length, horn length and tail length. Breed signif...
Canonical Correlation Analysis for Determination of Relationship between Morphological and Physiological Pollinated Characteristics in Five Varieties of Phalaenopsis
Phalaenopsis is an important genus of orchids that is grown for the economic production of cut flowers and potted plants. The objective of this study is to evaluate the correlation between morphological and physiological traits of self- and cross-pollination of 5 varieties of Phalaenopsis orchid. Some morphological traits were measured: Capsule length (CL), capsule volume (CV), weight of seeds in...
Canonical Analysis of the Relationship between Components of Professional Ethics and Dimensions of Social Responsibility
Background: Today, professional ethics and social responsibility play an important role in organizations. This study aimed to carry out a canonical analysis of the relationship between components of professional ethics and dimensions of social responsibility among the first high school teachers in the Naghadeh province. Method: This study is applied in terms of purpose, and in terms of data collec...